{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_13_04_retrain.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "**Module 13: Advanced/Other Topics**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 13 Video Material\n",
    "\n",
    "* Part 13.1: Flask and Deep Learning Web Services [[Video]](https://www.youtube.com/watch?v=H73m9XvKHug&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_13_01_flask.ipynb)\n",
    "* Part 13.2: Interrupting and Continuing Training  [[Video]](https://www.youtube.com/watch?v=kaQCdv46OBA&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_13_02_checkpoint.ipynb)\n",
    "* Part 13.3: Using a Keras Deep Neural Network with a Web Application  [[Video]](https://www.youtube.com/watch?v=OBbw0e-UroI&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_13_03_web.ipynb)\n",
    "* **Part 13.4: When to Retrain Your Neural Network** [[Video]](https://www.youtube.com/watch?v=K2Tjdx_1v9g&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_13_04_retrain.ipynb)\n",
    "* Part 13.5: Tensor Processing Units (TPUs)  [[Video]](https://www.youtube.com/watch?v=Ygyf3NUqvSc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_13_05_tpu.ipynb)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 13.4: When to Retrain Your Neural Network\n",
    "\n",
    "Dataset drift is a problem frequently seen in real-world applications of machine learning. Academic problems that courses typically present in school assignments usually do not experience this problem. For a class assignment, your instructor provides a single data set representing all of the data you will ever see for a task. In the real world, you obtain initial data to train your model; then, you will acquire new data over time that you use your model to predict.\n",
    "\n",
    "Consider this example. You create a startup company that develops a mobile application that helps people find jobs. To train your machine learning model, you collect attributes about people and their careers. Once you have your data, you can prepare your neural network to suggest the best jobs for individuals. \n",
    "\n",
    "Once your application is released, you will hopefully obtain new data. This data will come from job seekers using your app. These people are your customers. You have x values (their attributes), but you do not have y-values (their jobs). Your customers have come to you to find out what their be jobs will be. You will provide the customer's attributes to the neural network, and then it will predict their jobs. Usually, companies develop neural networks on initial data and then use the neural network to perform predictions on new data obtained over time from their customers.\n",
    "\n",
    "Your job prediction model will become less relevant as the industry introduces new job types and the demographics of your customers change. However, companies must look if their model is still relevant as time passes. This change in your underlying data is called dataset drift. In this section, we will see ways that you can measure dataset drift.\n",
    "\n",
    "You can present your model with new data and see how its accuracy changes over time. However, to calculate efficiency, you must know the expected outputs from the model (y-values). You may not know the correct outcomes for new data that you are obtaining in real-time. Therefore, we will look at algorithms that examine the x-inputs and determine how much they have changed in distribution from the original x-inputs that we trained on. These changes are called dataset drift.\n",
    "\n",
    "Let's begin by creating generated data that illustrates drift. We present the following code to create a chart that shows such drift."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-1.1979470956001936\n",
      "0.9888340153211445\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEWCAYAAABrDZDcAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nOydeVxV1RbHv5vLLAiKQ6YCJs4iiOaQWjmVZqZiOYQ5h9rLstmelWnRazJNG309RzA100yzTE1ynnDCWRxwKCdUBEHG/f643BvDvXAv3ANc3N/Ph4+cc/Y5Z3GFs87ea63fElJKFAqFQnH34lDWBigUCoWibFGOQKFQKO5ylCNQKBSKuxzlCBQKheIuRzkChUKhuMtRjkChUCjucpQjUChsiBDiGyHE27m2xwkhLgshkoUQPmVpm0JhDqHqCBQK0wghzgI1gUwgCzgCLABmSymzLTjfCbgFtJNSHsjZJ4EGUso4rexWKKxFzQgUisLpLaX0BPyAD4E3gP+ZGiiE0OXbVRNwBQ5raqFCUUIcy9oAhcIekFImAj8LIS4BO4QQ04BXgVT0TuIhoI8QYghwAf3MYV/O6TeFELv45+/tQM7MYJSUcklp/hwKhSmUI1AorEBKuUsIcQHolLPraeAx4HHAGRiSM+6EEKIZcAbwllJmgnFpKEgtDSnKE8oRKBTW8xdQNef7lVLKrTnf3xFClJFJCkXxUTEChcJ6agPXc74/X5aGKBS2QDkChcIKhBD3o3cEW3J2qbQ7hd2jHIFCYQFCiMpCiMeBxUCklDK2mJe6DNxnO8sUipKjYgQKReGsEkJkAtno6wg+A74pwfXeBeYLIdyAcCnl0pKbqFCUDFVQplAoFHc5amlIoVAo7nI0cwRCiDlCiCtCiENmjgshxEwhRJwQ4qAQIkQrWxQKhUJhHi1nBPOAHoUc7wk0yPkKB77W0BaFQqFQmEEzRyCl3MQ/udam6AMskHp2AN5CiFpa2aNQKBQK05Rl1lBt8hbjXMjZ93f+gUKIcPSzBlxcXFr5+fmVioH5SclIMXvM3cnd7LHs7GwcHOwnHGNv9oKyuTSwN3tB2ZybEydOXJNSVjd1rCwdgalafJMpTFLK2cBsAGdnZ7llyxaqVzf582iK/wx/4hPjC+z38/Lj+ITjZs+Ljo7m4Ycf1tAy22Jv9oKyuTSwN3tB2ZwbIUTBh1cOZekqLwB1c23XQa/hUigZGRn06dOHO3fuaGaYOSK6RhR483d3cieia0Sp26JQKBS2oiwdwc/A0JzsoXZAopSywLKQKbZv387w4cPJzi6yN4hNCQsMY3bv2fh5+SEQ+Hn5Mbv3bMICw0rVDoVCobAlmi0NCSG+Bx4GquXI9k4GnACklN8Aa9DL98YBKcAIa66/ZMkS6tevT0RE6b6NhwWGqQe/QqGoUGjmCKSUg4s4LoF/WXtdb29vbt68CcAHH3xAQEAAI0ZY5UMUCkUxEEJw5syZMlmWLS5eXl4cPXq0rM2wipLa7OrqSp06dXBycrL4HLvTGqpRowYPPPAAa9asASA8PBw/Pz+6dOlSxpYpFBWbSpUq4enpib+/P/bSdyEpKQlPT8+yNsMqSmKzlJKEhAQuXLhAvXr1LD7PvvKqcli8eDEtWrQAIDMzk/79+3Ps2LEytkqhqNjodDp8fHzsxgncjQgh8PHxsXrWZpeOwNPTk19++YVatfT1Zzdv3qRXr15cvXq1jC0rPlGxUfjP8MdhigP+M/yJio0qa5MUigIoJ1D+Kc7/kV06AoA6deqwevVq3N316ZynT5+mb9++drV+aSAqNorwVeHEJ8YjkcQnxhO+Klw5A4VCUSrYrSMACAkJ4fvvvzd6wG3btjFixIhSTystKZM2TCpQtZySkcKkDZPKyCKFovyRkJBAcHAwwcHB3HPPPdSuXdu4nZ6ebtE1RowYwfHj5os/Ab788kuiomzzEtaxY0caNWpEixYtaNy4MS+88AKJiYmFnpOdnc2HH35ok/tbil07AoAnnniCzz77zLi9ePFiJk+eXIYWWc+5xHNW7Vco7kZ8fHzYv38/+/fvZ+zYsbz00kvGbWdnZ0AfLC3sRXDu3Lk0atSo0Pv861//IizMdiniS5Ys4eDBgxw8eBAHBwdCQ0MLHa8cQTF58cUXee6554zb77//PvPmzSs7g6zE18vXqv0KhT1QWnGvuLg4mjdvztixYwkJCeHvv/8mPDyc1q1b06ZNG6ZOnWoc27FjR/bv309mZibe3t5MnDiRoKAg2rdvz5UrVwB46623mDFjhnH8xIkTadOmDY0aNWLbtm0A3L59m/79+xMUFMTgwYNp3bo1+/fvL9ROZ2dnPv30U06ePMnhw4cB6N27N61ataJZs2Z89913AEyePJmkpCSCg4MZOnSo2XG2pEI4AiEEn3/+OT179jTuCw8PZ+PGjZrf2xa/7Eq6QlHRKO2415EjRxg1ahT79u2jdu3afPjhh+zZs4dt27axbt06jhw5UuCcxMREHnroIQ4cOED79u2ZM2eOyWtLKdm1axeffPKJ0anMmjWLe+65hwMHDjBx4kT27dtnkZ2Ojo60aNHCmOU4f/58YmJi2L17N5999hk3btxgypQpeHp6sn//fhYsWGB2nC2pEI4A9B/wkiVLjGmlGRkZhIaGappWaqtfdiVdoaholHbcq379+tx///3G7e+//56QkBA6derE0aNHTToCNzc348tjq1atOHv2rMlrG5Zyco/ZsmULgwYNAiAoKIhmzZpZbGvu9sDTp083zkguXLjAqVOnTJ5j6bjiYncFZYXh6enJ6tWradu2LX///bcxrXTHjh2aqJUW9sue+yF+PfU6/jP8OZd4Dl8vXyK6RhR4yCvpCkVForTjXpUqVTJ+f/LkST7//HN27dqFTqdj3LhxJrMJDXEF0NdIZGZmmry2i4tLgTHF7fWemZnJoUOHaNKkCevXr2fTpk3s2LEDNzc3OnbsaNJOS8eVhAozIzBQt25dVq1aVSpppZb8skfFRhGfGG/TKbKqOVCUd8oy7nXr1i08PT2pXLkyly5dYu3atTa/R8eOHVm6dCkAsbGxJmcc+UlPT+eNN94gICCApk2bkpiYSNWqVXFzc+Pw4cPs3r0b0K9uAEanY26cLalwjgD0U7hFixZpnlZqyS/7pA2TyJZ571uSKbKqOVDYA2UZ9woJCaFp06Y0b96c8ePH06FDB5vfY/z48Vy8eJEWLVowbdo0mjdvjpeXl8mxAwcOpEWLFgQGBpKens7y5csB6NWrFykpKQQFBTF16lTatm1rPGfUqFG0aNGCoUOHFjrOZkgp7eqrYcOG0lI+++wzib7ZjQTkW2+9ZfG5lhB5MFK6R7hL3sX45R7hLiMPRhrHiHeF/HTRp3nG8C5SvCuKdU+/6X4FrsW7SL/pfjb6qaTcuHGjza5VWiibtWfv3r1WjY88GCn9pvtJ8a6QftP98vxdlBa3bt3S5LoZGRkyNTVVSinliRMnpL+/v8zIyLDJtW1h85EjRwrsA/ZIM8/VChUjyM+ECRM4efIkX3/9NaBPK61fvz7Dhw+3yfUNa/qTNkwyu/5v6ymyqjlQ2AsVOe6VnJxM165dyczMRErJt99+a1zSsUfs13ILEEIwc+ZMzpw5w2+//Qbo00r9/f1L3AouKjYqjwNYGLrQ5C99RNcIrhy+kmdfSabIvl6+JttlqpoDhaL08Pb2JiYmpqzNsBkV2hHAP2mlnTp14uDBg2RkZNCvXz+2b99O48aN84xNSE5jT/wNziWkcPFmKqnpWQC4OeuoU8UNP59KtParwprTPxC+KtyYMWRYpwdMZgMtv7AcPy+/QrOGLCWia0See4OqOVAoFCWjwjsCgMqVK5tNK011qMSPey/w66G/OXE52XiOh4sjHi76jyfpTga3c5yCECB1yTjyODrdRrIc9IqnptJGDVR1q8rZCWdt8rNYshylUCgU1nBXOAL4J630wQcfJCUlhQt3nOg4aRFpVQMQAtrWq8rrPRrRtl5VAmp44uX2T3cfKSWJqRmcuJzMrjMJvL/+MFWyh+KdOYRUh50kOi0m3eGUxev0+ZeVrH2QV+S1V4VCUfrcNY4A9GmlM+cs4t/L9uFW/35SUhLxu3mA7z94kbpVK5k9TwiBt7szbepVpU29qnx68Csu3rhDpazueGY+Tq209tx22Eoln1+LtMGQ/mnJspJCoVCUBhWyjsAUmVnZzNpwko9jnfEKCOFG9FwufjOKrd9OYvZ065T+IrpG4OySRKJTJBddR3LTMQr37NY433iP/246TVa2+apDJTmtUBQPW8hQA8yZM4dLly4Zty2RpraEzMxMdDodwcHBNGvWjODgYGbMmFFk/dLp06dZvHhxie9fEu4KR3DhRgoDZ+9g2roTdGtSkz/f6E5YSA1khr7a+P3332f+/PkWXy+3NhAiFe9q23itTxqdAmoSseYoQ+fs5Mot05XMKv1ToSgelshQW0J+R2CJNLWlGMTiDh8+zNq1a1m5ciUREYUncihHUAr8eeIqj32+meOXkvh8UDBfhoVQy9uNmTNn5lErffbZZ4mOjrb4umGBYZydcJbsydmcnXCW59sP5rthrfmofyAx8Tfo+flmdp5OKHCekpxW3DVERYG/Pzg46P+1UbMXU8yfP582bdoQHBzMc889R3Z2NpmZmTz77LMEBgbSvHlzZs6cyZIlS9i/fz8DBw40ziQskaY+efIkbdu2pU2bNrz99tt4e3sXaVPNmjX59ttvmTVrFgCnTp2iU6dOtGzZklatWrFz504AJk6cyMaNGwkODmbmzJmcPn3a5DgtqdCOYMH2s4yYu4vaVdz55YWO9AmubTzm6OjI4sWLCQwMBDCmlZZkiiiEYOD9vqwe3xEvdyeG/G8nP+w5n2eMkpxW3BVERUF4OMTHg5T6f8PDNXEGhw4dYsWKFWzbts34QF+8eDExMTEkJCQQGxvLoUOHGDp0qNEBGBxC/pmEOWnq8ePH8+qrr7Jr1y5q1qxpsW0NGzYkNTWVhIQEatWqxbp169i3bx9RUVG88MILAHz44Yd07tyZ/fv388ILL3DPPfeYHKclFdIRSCn5YM1R3ll5mM6NarBsbHv8fAoGgytXrswvv/zCPffcA8DNmzd57LHHuHr1aonuH1DDkxXjOtCmXlVeW3aQn+LSkVIas4VSMlLQCR2AkpxWVEwmTYKUvLEwUlL0+23M+vXr2b17N61btyY4OJg///yTU6dOERAQQFxcHC+++CJr1641qwWUG3PS1Dt37qR///4APP3001bZJ3OUStPS0hg1ahTNmzdn0KBBZoXqLB1nSyqcI8jOlry98hCzN53mmXZ+zB7amkou5pOj6taty+rVq22mVmpQBq3yiQtbk56m5X3p/BSXwdAFywn/OdxYFZwls4wzAeUEFBWOc2ZiXub2lwApJSNHjjTGC44fP87bb7+Nj48P27Zto2PHjsycOZMxY8YUeS1Lpakt5cSJE7i7u+Pj48O0adOoW7cusbGx7Nq1i7S0NJPnzJo1y6JxtqRCOYLsbMnE5QeJ3HGOMQ/dx9Q+zdA5iCLPs5VaaQFl0Ftn+P3qM3S4N4vNR11xSQnTy9/loLKFFBUWXzMxL3P7S0C3bt1YunQp165dA/TZRefOnePq1atIKXnqqaeYMmUKe/fuBfQB3aSkJKvu0aZNG1asWAFgcWD3ypUrjBs3jvHjxwP6ZadatWohhGD+/PnGmUJ+e27dumVynJZUGEcgpSRs/gqW7rlAouNivj3Wk0WHFll8fp8+ffj000+N24sXL2by5MlW2WAyNTTzNm1rxnNLt5LKWX3wyhyS57jKFlJUSCIiwD1vLAx3d/1+GxMYGMjkyZPp1q0bLVq04JFHHuHy5cucP3+enj17EhwczLPPPssHH3wA6NNFR48ebVXa6cyZM/noo49o06YNV65cMbvMZOg13LRpUx555BEef/xxJuUshz3//PN89913tGvXjvj4eGPDm5YtW5KVlUVQUJBx5mJqnJaI0vA2tqRRo0bSVEB3RNSPbIx15ZbuJ244fQdCH4S1Zv1dSslzzz3HN998Y9w3f/58YwPponCY4oCk4Of5acNPmXVpFknXnsAz61FuOP6PW076tws/Lz+byU/Yiujo6BKL8pU2ymbt2bdvHy1btrT8hKgofUzg3Dn9TCAiAsJKdxk0KSkJT0/PEl/n9u3buLu7I4QgMjKSFStW8OOPP9rAwoLYwuajR4/SpEmTPPuEEDFSytamxleIGcHiXefYGOtKsu53oxMA65dehBDMmjWLRx991Lhv9OjRFqeVmksBddY5E9Etgjvuc7it20SVzFG4Z3YymS1UnO5jqmOZolwSFgZnz0J2tv7fUnYCtmT37t20bNmSFi1a8N///pdPPvmkrE2yKXYvMbE17hpv/XSIVIcYEpy+MDoBA9YuvTg6OrJ06VI6duxIbGwsGRkZhIaGsn379iKLTswpg9b2rE1ooL4B9qT1k7lz1YfqGS8zpvMdwgIHGccWR35CSVYoFNrz8MMPs3///rI2QzPsekZw+moyYyNjqF/dA7dqi0AUDO4Wp1Arf1rpjRs36NWrlzEYZY7cFccCYUwNrepW1Xj87EtxHJk0AT8fL37Y6sOFG/84jeLITyjJCoVCUVLs1hEkp2UyZmEMTjoH/je8NRHd37ZpoZZBrdTNzQ3QVwX26dOnyLTS/BXHJmWpKzkzZ/j9ZGRmMzYyhjsZeonr4shPKMkKhUJRUuzSEUgpeX3ZAU5dTeaLwS2pU8Xd7Nt4SZZHWrduXSCtdOTIkTZJ56pf3YPpA4M5dPEWb/90CCllseQnlGSFQqEoKXbpCP635QxrYi8xsWdjHgioZtxvydu4tfTt2zdPYOj777+3Oq3UHN2a1uSFLgH8EHOBJbvPF0t+QklWKBSKkmJ3jiAtCz767RiPNqvJs53uK5V71uxeE48HPIzb7733nlVqpYXxYreGdGpQjXdXHeb+Gk/8o2oK6ITOuN5vLhNIi5mQQlEesYUMta0kpwHq1KnDzZs3Te7v3Llznn3NmzcnODjYqusbxPBKOsYSNM0aEkL0AD4HdMB3UsoP8x33BeYD3jljJkop1xR2zaup2QR7uvJx/yDjko2WRMVGMWb1GFK6psBl4JR+/6jRo/Dz8ytxHrjOQTBtQBCPfb6Z5xft5efn9VlE1mQCqY5lirsBgww1wLvvvouHhwevvvpqnjFSSqSUODiYfsedO3eu5naCXrfsr7/+4t577yU2NhZHx/KdoKnZjEAIoQO+BHoCTYHBQoim+Ya9BSyVUrYEBgFfFXXdzGz4fFAwXu5ORQ21CcasHB3wFFBDvz8rM4vQ0FCbvF3U8HTlswHBnLiczPu/HFGZQAqFFcTFxdG8eXPGjh1LSEgIf//9N+Hh4bRu3Zo2bdowdepU41hLJKcvX75MaGio8fwdO3YAcPXqVbp3705ISAjjxo0rNFb41FNPsXTpUkC/nDx48GDjsdTUVIYNG0ZgYCAhISFs2rQJgJSUFJ566inat2/PoEGD8iSm/Prrr7Rv356QkBAGDhzI7du3bfcBou3SUBsgTkp5WkqZDiwG+uQbI4HKOd97AX8VddGqroLW/lVtamhh5Mm+cQWeBnJWiSxNK7WEBxtW59lO9YjccY4r131MjolPjFcFY4pygRBCs6/icOTIEUaNGsW+ffuoXbs2H374IXv27GHbtm2sW7fOpIKnOcnpF154gddff509e/awdOlSRo8eDcDkyZPp3Lkze/fupUePHvz1l/nH1VNPPcWyZcsAWLNmDb169TIemzlzJs7OzsTGxrJw4UKeeeYZ0tPT+eKLL6hSpQrbt2/njTfeYN++fYBes+jDDz9kw4YN7N27lxYtWvD5558X63Myh5bzldpAbjH+C0DbfGPeBX4XQowHKgHdTF1ICBEOhANUr17dqgYyJeXzJp+TnpV3/fF8tfN8NfUrMtIzOHXqFJ07d2batGlmuyQlJydbZPP9rpI1HoJKae/w9H2ncXU0LXp35fAVll9YbqxPsDWW2lueUDZrT+XKla0WaysOltwjLS0NJycnkpKSSE5Opl69ejRu3Nh47ty5c1m4cCEZGRlcunSJmJgY6tatS1ZWFrdv3yYpKQk3Nzc6duxIUlISTZs2Zfv27SQlJbFu3TqOHj1qvNf169e5cuUK0dHRLFu2jKSkJLp06WIUi9PpdHlsk1Li6uqKi4sLc+fOpUmTJmRmZpKdnU1SUhLR0dG8+OKLJCUl4evrS82aNTlw4AB//PEHEyZMICsri4CAAJo0acLt27fZsGEDhw8fpl27dgCkp6fTvn17kpKS8vw8ublz545Vv1taOgJTrj3/XGowME9KOU0I0R5YKIRoLqXM8wSUUs4GZoNea6g09Vkuxl40WS38r4//xecvfY6UkkOHDjFv3jyioqJMvtFYoylzb+NEnvhiM7OOX+Wy03/MjtNSo8jeNHBA2Vwa7Nu3zya6PUVhyT1cXFxwcXHB09MTDw8PPD09jeedPHmSb7/9ll27dqHT6Rg3bhxCCDw9PdHpdFSqVAlPT0+cnZ2N53h4eBjHAOzZs6fAi52Dg0Oe+xhszW+v4TphYWG8+uqrREZG4uHhYTxfp9Ph7u5uPM9gk6OjI5UqVUKn0+Hp6YmDgwOVKlXC1dWVnj17snDhwgKfQ+6fJzeurq5W6UJpuTR0Aaiba7sOBZd+RgFLAaSU29EvvlSjjMmt3TNpwySGBQ0rkJUz/cXpNkkrza8TdOD6aiZ0a4RrZgf8XHqbPU8VjCnKEkNQVouvknLr1i08PT2pXLkyly5dYu3atVad361bN7788kvjtiFA/eCDDxKV02Ft1apVRc5c+vfvz+uvv0737t3z7M99naNHj/L3338TEBCQZ/+BAwc4fPgwAA888AB//vknp0+fBvQCeCdPnrTqZyoKLR3BbqCBEKKeEMIZfTD453xjzgFdAYQQTdA7gpK1ByuCogTaCvQUSIxn/oH5RHSNKFCf8PLLLzN27Fjjue+99x4LFiywypb89wpfFY53tV00qVWZGlkv4OfZ2OS5qmBMoTBNSEgITZs2pXnz5owfP54OHTpYdf6XX37J1q1badGiBU2bNuW///0vAFOmTGH9+vWEhIQQHR1N7dq1C72Ol5cXb7zxRoGMofHjx5OamkpgYCBhYWEsWLAAZ2dnnn/+eRISEmjfvj3Tp0+ndWu9UGjNmjX53//+x8CBAwkKCuKBBx7gxIkTVv1MRaGpDLUQ4jFgBvqcmzlSygghxFRgj5Ty55wsov+iD79K4HUp5e+FXdOcDLUl5Bdog4JS1f4z/I1dxHJjbikmMzOTxx9/3PjW4eTkxLp163jooYeMrSnH1xzPrMuzCnQjK+xeq586QJ8vtxJc7w6/XX2mUJttjb0tWYCyuTSwWoa6HGArGerSpMLJUEsp10gpG0op60spI3L2vSOl/Dnn+yNSyg5SyiApZXBRTqCkmEvLHLZimHGGYOrBDOaXYgxqpc2bNwcgIyODfv368emqT41v+/BPLUDuGUhhOkHNa3vxbKf7iDnlwsT756iCMYVCoRl2V1lcEsw9eLNklnFpRpiMcRe+FGNKrfTNkW+ScrPwWoCidIImdGuAb1V3Nh6syfHnT9lUOkOhUCgMVEhHYC4OYMm6ukQWcAaWaPf4+vrmUSvNvJYJS4B8va9zO6OidIJcnXRM7dOM01dvM/vP00XabgrVtEahUBRFhXME5gKwUbFRJh+8ppDIYi3FtG7dOm8K6TlgJXkyIXI7I0t0gh5uVINeLWoxa2Mc8QnWVRMW9lkoFAqFgfItgFEMCpNnMAR7J22YxLnEczgIB7JkVoFrlCRHv1+/fnzyySf/aKDEwu8//g4tTM8sLNEJeufxpvx5/CpTVh1hzvD7LbalsM9CLS8pFAoDFW5GUFSjltxS1fP7zddEwvnll19mzJgxxu11y9fhc8Kn2EHempVdebFrA/44doUNRy9bfJ5qWqNQKCyhwjkCaxq1aCXhLITgiy++4NFHHzXuu/XDLereqFvIWYUzvIM/ATU8mLLqiLGjWVFY+lnkjyNcT71ebDsVCq2whQy1Jaxfvx4vLy/jtXP/HduCvXv38ttvvxm3V6xYkac4tSyocI7A2kYtWjSzAX1a6ZIlS/Kklfbt27fYhSBOOgfe7d2Mc9dT+O8mywLHlnwWpuII8YnxKo6gKHcYZKj379/P2LFjeemll4zbBjkIKSXZ2aY1uqyhc+fOxmtbW5lcFPkdQb9+/Xjttddseg9rqXCOwNxbPlDq2TNeXl788ssvVK2qF4crqVppxwbV6NHsHr7+8xSXbxXeOxksm/GYiiNky2wlea2wG/LLUJ8/fx5vb2/j8cWLFxsVRM1JTFvCkCFD+Omnn4zbHh56GeL169fTtWtXQkNDadSoEUOHDjWO2blzJ+3btycoKIi2bdty+/Ztpk6dSlRUFMHBwSxbtozvvvuOCRMmAHDmzBl69epFixYt6N69OxcuXDDe+8UXX+SBBx7gvvvuY8WKFcX/wExQ4YLFUDAAm7+iuKhGL7bE19eXDz74gJdeeonU1FTi4uLo168f69evx8XFxerr/fuxJvzx2RU+WXucT58KKnJ8UcFoFUdQFIcpqw5z5K9bNr1m03srM7l3s2Kde+TIEebOncs333xDZmam2XEGiel27dpx9uxZHn/8cQ4dOlRg3MaNG40dxQYNGsTEiRMLvf/evXs5cuQINWrUoF27duzYsYPg4GAGDRrEjz/+SEhICImJibi6uvLOO+9w6NAhZsyYAcB3331nvM5zzz3H0KFDGT16NLNnz2bChAlGOesrV66wdetWYmNjGTBgAP369bP6czJHhZsRmKKsG700atQoT1rpli1bGDlyZLEEtnx93BnZsR7LYi4QeyGxxLZZE1NRKMor9evX5/77i86oW79+PWPHjiU4OJi+ffty48YNUlNTC4zLvTRUlBMAaNeuHbVq1UKn0xEcHMzZs2c5evQovr6+hISEAPoVgvyS1fnZuXMnTz75JABDhw5l8+bNxmN9+/ZFCEGLFi24ePFikTZZQ4WcEeSnPLz19uvXj48//ti4Frho0SICArP8jagAACAASURBVAKYMmWK1df6V+f6LIs5z9TVh1k6pn2xm3lExUaRnJ5cYL+DcChx5pSiYlPcN3etqFSpkvF7BweHPC9ZuTt9SSnZtWuX2d4hheHo6GiMP2RlZeWZeeSe3et0OjIzM5FS2rSdbu572Foj7q6YEZTGW68lFbyvvPJKnrTSqVOnmtQYLwpPVyde6t6Q3Wdv8PsRy9NJ89sbviqchNSEPPt93Hzw8/JTdQYKu8XBwYEqVapw8uRJsrOz86ynm5OYtgR/f39iYmIAfaZPVlbh2XvNmjUjPj6evXv3Anp57KysLGNDG1O0a9eO5cuXAxAZGcmDDz5osX0l4a5wBNZmElmLpRW8QghmzZrFI488Ytw3atQoY89SaxjYui4BNTz46NdjZGRZnyVharkMwMPZQ7POZwpFafHRRx/Ro0cPevfuTZ06dYz7zUlMW8KYMWNYt24dbdq0Yf/+/UXG+FxcXPj+++8ZN24cQUFBPPLII6SlpdGlSxcOHDhAy5Ytjev/Br744gvmzZtHixYtWLJkCdOnT7fuBy8mmspQa0FxZagNktDnEs/h6+VbQBK6JBQlXZ1fbjgxMZGOHTsag1RVq1Zl+/btNGzY0Kr7bjh6mVHz9zC1TzOGtve36lyHKQ7IAg3jQCD446E/7EoeGexP0hnsz2YlQ106VDgZ6vKEVvUCYH0MwsvLi9WrV1OzZk1A3xO1OGmlXRrXoN19VZmx/iRJdzKsOlcFiRUKhYG7xhFoSXEeqn5+fnnUSg1ppWlpaRbfVwjBvx9rwvXb6fx38xmrbNZ6uUyhUNgPyhHYgOI+VO+//34iIyNLlFbaoo43vQJr8d3m01xNstyJaCWvoajY2NtS8t1Icf6PlCOwASV5qIaGhvLxxx8btxctWmR1SukrjzQkLTObL/6wrKG1IcPpmeXPALAwdKFqeKMokqysLBISEpQzKMdIKUlISMDV1dWq8+y6jkDLALC1WCInbY5XXnmFkydPMnu2XgpjypQpBAQEMGTIEIvOv6+6BwPvr0vUznOM7FgPP59KZseWZZW1wr65ffs2SUlJXL16taxNsZg7d+5Y/VAsa0pqs6ura55MKUuwO0eQlZ1lzNIRCGPmiz0/0AxqpWfPnuX33/Vtm0eOHImvr6/FecQTujZg+d4LzFh/kukDg82OUz0KFMVFSkm9evXK2gyriI6OtrtMp7Kw2e6WhtKy0oypmvnTH0tTNsLWODk5sXTpUpo101dsZmRk0K9fP06etGy5p0ZlV4Y94M9P+y9y/JLpYhUoH1XWCoWifGF3jqAo7PmBZlArzZ1W+thjj5GQkFDEmXrGPlgfD2dHpv1uvs5CpY0qFIr8VDhH4CAc7FpL31Raad++fS1KK61SyZnwB+/j9yOX2X/+pskxKm1UoVDkp8I5giyZZfcN2k2llY4ePdqibI1KVXaBSOLRr2eZ1DxSaaMKhSI/Fc4RgH3HCgyEhoby0UcfGbcjIyOZOnVqoedExUYx/rdwruuW4pYdwqXrHiadopZV1gqFwv6wS0egE3pNbz8vP7Nj7DlWYODVV1/l2WefNW6/++67REZGmh1vyAhKdlxDJtfxzhxCSrr9O0WFQqEtdukIsmSWcV3bnDOoCMFPIQRffvkl3bt3N+4bNWpUnmYVuTE4PynSuOW0FNfsQFyzgyqEU1QoFNphl44A/ln+qejBTycnJ3744QdjWml6ejp9+/Y1mVaa2/kl6X4jU1zFK3MIvpXt3ykqFArtsFtHAPo34Lsh+GlKrdRUWmkepygySXRcimt2E4Y3/Sj/JRUKhcKIXTsCwxvw3RD89Pf35+effy5UrTS/U/Spehxv92z2xdVR+jAKhcIsdusIKtLyj6W0adMmT2vLzZs3F0grzeMUXzrF648GceD8TaKP248+jEKhKF3s0hFUxOUfS+nfv79VaaVPtqpDbW83pq8/oWYFCoXCJHbnCNyd3Cvs8o+lvPbaawXSSqOiTBfQOTs6ML5LAAcvJKpZgUKhMIndOQLFP2mlzds3N+4bMmwIb8972+T40JA61KnixgwbzwoMfQ0cpjiYrGJWKBT2gaaOQAjRQwhxXAgRJ4SYaGbMACHEESHEYSHEIi3tqUgsPbaUU91OQfWcHVnw/nPvM231tAJjnR0d+FfnAA5cSCT6hG1mBYa+BvGJ8UikUQZcOQOFwv7QzBEIIXTAl0BPoCkwWAjRNN+YBsCbQAcpZTNgglb2VDQmbZhEqi4VwgBDH5pUmDhyokm10v4h+ljBjPUnbTIrKKyvgUKhsC+0nBG0AeKklKellOnAYqBPvjHPAl9KKW8ASCmvaGhPhcJYLewNDMbYYijzaiahoaEF1EqNs4LzN/nTBrMC1ddAoag4CK0ySYQQTwI9pJSjc7afAdpKKZ/PNeYn4ATQAdAB70opfzNxrXAgHKB69eqtli5dqonNWpGcnIyHh4dNrxl7JZb0rHTj9sFdB1kwY4Fxu3v37rz55ptGBVOAzGzJG5tS8XYRvNXONc8xa+3Nf38DzjpnAmsEWvvjlBgtPmOtsTeb7c1eUDbnpnPnzjFSytamjmnZqtLUUya/13EEGgAPA3WAzUKI5lLKPGL6UsrZwGyARo0ayYcfftjmxmpJdHQ0trb5YuzFPL2H8QanHk5k/JYBwLp16+jQoQOTJ0/Oc95L7vG89dMhHOs0p1OD6vkva7G9Be6PPqNrdu/ZPBxY+LlaoMVnrDX2ZrO92QvKZkvRcmnoAlA313Yd4C8TY1ZKKTOklGeA4+gdg8IEubN0Jm2YxLCgYXmkNeZ8NIfw8HDjeFNppU+1rkMtL1c+L2Gs4G6Q9lAo7ha0nBHsBhoIIeoBF4FBwNP5xvyEfoV7nhCiGtAQOK2hTXaLIUvH8AYenxjP/APzCzx8B34xkDNnzrBu3ToARo4cia+vL506dQLAxVHHuIfr887Kw2w/lcADAdWKbVNYYJh68CsUFQDNZgRSykzgeWAtcBRYKqU8LISYKoR4ImfYWiBBCHEE2Ai8JqW0rEHvXYalWToGtdKmTfUJWqbUSge0rksNTxdm/lFQwVShUNx9aFpHIKVcI6VsKKWsL6WMyNn3jpTy55zvpZTyZSllUylloJRysZb22DPWZOl4eXnxyy+/UKNGDUCvVtqrVy9jWqmrk46xD9Vnx+nr7Dyt/K5CcbejKovtBHONdsztN6iVurq6AnDy5Mk8aqWD2/hSzcOFWX/EaWOwQpGfqCjw9wcHB/2/ZmRRFKWPcgR2QnEa8LRt2zZPa8vcaqVuzjrGPHgfW+KuERN/QzO7FQpA/9APD4f4eJBS/294uHIG5QTlCOyE4mbpmFIrfe+99/TXbOdL1UrOzNygYgUKjZk0CVLyxrhISdHvV5Q5WmYNKWxMcbN0XnvtNU6ePMl3330HwOTJkwkICODpp59mdKd6fPzbcQ6cv0lQXW9bm6xQ6DlnpuLc3H5FqaJmBHcBQgi++uorunXrZtw3YsQItmzZwtD2/ni5OalYgUJbfM30zTa3X1GqWO0IhBAOQojKWhij0A4nJyeWLVtWIK300vmzjOpYj/VHL3PoYmIZW6mosEREgHveGBfu7vr9ijLHIkcghFgkhKgshKgEHAGOCyFe09Y0ha3Jn1aakJDAY489xhONK+Pp4sgXFswKVA8CRbEIC4PZs8HPD4TQ/zt7tn6/osyxdEbQVEp5C+gLrAF8gWc0s0qhGf7+/qxatSpPWunQwU/xTLu6/Hb4EscvJZk9V/UgUJSIsDA4exays/X/KidQbrDUETgJIZzQO4KVUsoMCgrIKeyENm3asHDhQuP25s2b2fv9NCo565hVSLWx6kGgUFRMLHUE3wJn0bdA2SSE8ANuaWWUouQUtYTz5JNP5kkrXbJwDvWzL/BL7N/8lZxt8pqqB4FCUTGxyBFIKWdKKWtLKR/LkYWIBzprbJuimFi6hPPaa68xevRo4/Yvn72CI5JVpwv2GYiKjcJBmP51MVfdrFAo7ANLg8U1hRD/E0L8mrPdFBimqWWKYmPpEk7+tNLs1Fvc2L2SHX9lcubabeM4g2PJklkF7lVUdbNCoSj/WLo0NA+9Uui9OdsnUP2Fyy3WLOHkVyu9sX0Z2RnpfLhyr3GMKccCoBM61YPAXlA6P4pCsNQRVJNSLgWywSgxXfD1UFEusFagztvb25hWmp1yk6T9v7L2+A0OnNL3ETLnWLJltnIC9kBZ6fwo52M3WOoIbgshfMjJFBJCtANU9VE5pTgCdbnVSm/t/BGZnUXY+/OZFzNPxQbsHXM6P8OGafeQViJzdoWljuBl4GegvhBiK7AAGK+ZVYoSUVyBurZt27Jw4UKybt8gaf9vJFVrypjn/01WtooN2DXm9HyysrR7SCuRObvC0qyhvcBDwAPAGKCZlPKgloYpSkZYYBhnJ5wle3I2ZyectXgJ58knnyQ8PJxbO38EKfHwCoVNeceo2ICdYYmej60e0obloPh408eVyFy5pFD1USFEqJlDDYUQSCmXa2CToowZNGgQ2dnZ/HhgLZ7BPUicvZSsqlchUH9cxQbsjIgI/Rt//jf0/JT0IW1YDirsPkpkrlxS1IygdyFfj2trmsKWWKMRZEgrTbvyM0iJV7sB8BOQ85KnYgN2Rn6dH53O9LiSPqRNLQflRonMlVsKdQRSyhGFfI0sLSMVJaM4GkFOTk5M/2oyySd/x6NFN3Tu1WExuN5yVbEBeyS3zs/8+doogRY2o1Aic+Uai2WohRC9hBCvCyHeMXxpaZjCdhRXIyi8QzgvDm8CgFf7AZAK3j9607N2T81sVZQCWimBmptR+PkpkblyjqWVxd8AA9FnCgngKcBPQ7sUNqQkGkHv9BvPIwEe+llB5epcir9EaGgo6ekFZSgUdoStlUCjoiA5ueB+tRxkF1g6I3hASjkUuCGlnAK0B+pqZ5bCllhbYJafqYM64OjoiFf7gQD8+eefPPvss0ipBGgV/BMkTkjIu9/HRy0H2QmWOoLUnH9ThBD3AplAPW1MUtia4hSY5Q4ut5/bhNYN0vEKfhRHr5oALFiwgPfff19TuxXlmNxVw8OGmQ4Se3goJ2AnWOoIVgshvIGPgRjgDLBYM6sUNsXaArPrqdcLBJfX/PUCDg5w//C3jePeeecdvv/+e6tsUR3OKgD5q4azzKjNqJoBu6GoOoL7gfNSyvdytj2AWOAYMF178xS2IiwwzOLc/4tJFwsEl5OzLuLjGs1luvDgY6FsWqMvIRk+fDi+vr506NChyOsaspcM1zZkLxnsU9gJRaWJGlA1A3ZDUTOCb4F0ACHEg8CHOfsSgdnamqYoK9KzTAeCL2TPxdFBEDj4DZo00WcTpaen06dPH+Liiu53rDqcVRAsedNXQWK7oihHoJNSXs/5fiAwW0r5o5TybSBAW9MUZYWzztnk/jrengxt78eaw1f5OnI5NWrUACAhIYFevXpx/fp1k+cZMJelFJ8Yj8MUB6p9XI1qH1dTy0blHXNv+jqdakxvpxTpCIQQhuWjrsAfuY4VuqyksF9qe9Y2G1we+1B9XJ10/HAshZUrV+Lq6grAiRMnikwrLSxLSSJJSE0gITXB4qI3RRkREWG6IG3+fNWY3k4pyhF8D/wphFiJPnNoM4AQIgAlQ11hqepW1Wxw2cfDheEP+LP64N94+TVlwYIFxvOKSis1lb1UGGrZqJxS0oI01aeg3FHoW72UMkIIsQGoBfwu//kLd0DJUFdoCgsuj3mwPgt3xDPt9xN8N+wp/vOf//Dmm28C+rTSBg0a8NZbb5m8JuhjBecSzyEpug7BkqI3RRkQFla8t/78wnQGCWzDNRVlQpHpo1LKHVLKFVLK27n2nciRplbchXi5OzHmwftYf/Qye8/d4I033mDUqFHG42+//bbZtNLc8th+XkUXpyuBuwqG6lNQLrFYa0ihyM2IDvXwqeTMtN+PI4Tg66+/pmvXrsbjw4cPZ+vWrYVeo6ilItX8phQo7WUacxlHquagTFGOQFEsKrk48lznALbGJbA17hpOTk4sW7bMqrTS/IVuPm4++Lj5WNVVTVECyqKdpLmMI1VzUKYoR6AoNmFtfbnXy5WPfzuGlBJvb29++eUXqlevDliWVpp7qeja69e49vo1q7uqKYpJWSzTmMs4UjUHZYqmjkAI0UMIcVwIESeEmFjIuCeFEFII0VpLexS2xdVJx4TuDTlwIZHfDl0CoF69eqxcuRIXFxdAn1bav39/pVZaHimLZRqtJLAVJUIzRyCE0AFfAj2BpsBgIURTE+M8gReAnVrZotCO/iF1aFDDg09+P05mVjYA7du3z5NWGh0dTXh4uFIrLQ/kjgk4mPnz13qZxtYS2IoSo+WMoA0QJ6U8LaVMRy9S18fEuPfQi9nd0dAWhUboHASvPtqI01dvsyzmgnH/gAED+M9//mPcnj9/Ph988EFZmKgwYIlYnFqmuSsRWr2lCSGeBHpIKUfnbD8DtJVSPp9rTEvgLSllfyFENPCqlHKPiWuFA+EA1atXb7V06VJNbNaK5ORkPDw8ytoMi7HWXiklETvvcC1V8lEnN1wchXH/p59+ypo1a4xj33rrrTzZRWVlc3mg1G2OjYXCluicnaF2baha1eRh9RmXDlrZ3Llz5xgppenldymlJl/ou5h9l2v7GWBWrm0HIBrwz9mOBloXdd2GDRtKe2Pjxo1lbYJVFMfe3WcSpN8bq+WsDSfy7E9PT5ddunSRgASki4uL3LJli40s/QeTNkdGSunnJ6UQ+n8jI21+35JQ6r8XQkipnwvk/RLCotPt7fdYSmVzboA90sxzVculoQvk7WJWB/gr17Yn0ByIFkKcBdoBP6uAsX3S2r8qjzaryTd/nuZacppxvyGttHHjxgCkpaXRt29fTp06pa1BZZEaWd5RqZsKM2jpCHYDDYQQ9YQQzsAg4GfDQSllopSympTSX0rpD+wAnpAmloYU9sHrPRqTmpHFzA0n8+yvUqVKnrTSa9euWaRWWiJUBWtBVOqmwgyaKYhKKTOFEM8DawEdMEdKeVgIMRX9FOXnwq+gsDfqV/fg6Ta+RO08x9D2fgTU8DQeu++++1i5ciWdO3cmLS2N48eP079/f9auXYuzs2nZ6xKhKlgLYsjOmTRJ/zn4+uqdgAZZO0l3Mjh+KYkTl5M5dz2FCzdSuJqUxs2UDJLuZJCelU1mtsTRQeCkc2Bqn+Z0b1rT5nYoLENTKWkp5RpgTb5975gZ+7CWtihKhwndGvDTvot8sOYYc4bfn+eYIa104MCBwD9ppXPnzkUIYVtDfH31y0Gm9ld0oqLMP+yLKxZXBFeT0tgSd5Udp64Tc+4GcVeSjcecdIJ7vd2oWdkV/2rueLo64ezogKODIDNbkpmVTXVPF5vbpLAc1VNAYVN8PFx4vksA//n1GJtPXqVTg+p5jg8YMIBTp07x73//G9CnlQYEBJhUKy0RERF5VS7h7lgGKUV1z9NXk/n10CV+O3SJ2It6VXpvdydCfKvQJ+hemtSqTKN7PKnt7YaDg40dvcKmKEegsDnDO/gTuTOe91cf5ZcXfHDU5Q1FTZw4kbi4OObMmQPo1Urr16/P4MGDbWdEKS6DlCsKi43Y4GdPupPBT/v/YlnMBQ6cvwlAS19vXnu0EQ81rE7TWpXVQ98OUY5AYXNcHHX8u2cTxkXtZdGucwxt75/nuEGt9OzZs/zxh77p3YgRI/D19aVDhw62M0SjZZByjUaxkbgrySw4nMZzf2wgJT2Lxvd4MumxJjweVItaXm4lurai7FGOQKEJPZrfQ/v7fJj2+wl6t7iXKpXyBoSdnZ1ZtmwZDzzwAMeOHTOmle7YsYP69euXkdUVABvHRvafv8lXG+NYd/QyOgF9W9ZhSDs/gup42T6uoygzlPqowkhUbBT+M/yJ+TumxM3jhRBMfqIpSXcy+GzdCZNjzKWV3rhxo9j3tSuuX7d9LwAbpYjGXkhk5Lzd9P1yKzvPXGd85wA+e9idT58KIriut3ICFQzlCBSA3gmErwonPlH/NmmL5vGN76nMM+38iNoZz5G/bpkcY0grNaiVHj9+nNDQ0IqvVhoVpX9zt3XBWwnVPc8lpDD++330/mILe8/d4LVHG7F1YhdefqQRlZ2L+fBXPYrLPcoRKAB9H+GUjLxBRls0j3+5eyO83Z15e+UhsrNN61q1b9+e+fPnG7ejo6MZM2ZMxVYrnTRJr76ZG1sVvBVD3fN2WiYf/XaMbp/9ybojlxjfJYDNr3fmX50D8HApwQqyqvC2C5QjUADmm8SXtHm8l7sTb/ZsTEz8jTzqpPkZOHBgHnXSefPm5VEvtRhbvX1q/RZbTgrepJSsOvAXXaZF83X0KXoH3cufr3XmlUca4enqVPIbqApvu0A5AgVgvkm8LZrH9w+pQ2u/Knz42zFupphf8pk4cSIjR440bk+aNIklS5ZYfiNbvX2WxltsOdD9OX89hWFzdzP++33U8HRl+XMPMG1AEDUru9ruJuXE4SkKRzkCBWC6kbytmsc7OAje69ucxNQMPvz1mNlxhrTSzp07G/cNGzaMbdu2WXYjW719lsZbbEREwcYwpVTwlpUtmbPlDI9M30TM2eu827spP/2rAyG+VWx/s3Lg8BRFoxyBAsjbSB6wefP4JrUqM7pTPRbvPs/O0wlmxzk7O/Pjjz/mUSvt06ePZWqltnr7LI232LAwfSC3lFs2xifcZuC325m6+gjt7qvKupcfYniHeui0KgJTQnd2gXIECiOGRvKtarXSpHn8hK4NqVvVjTdXxJKWaaI7Vg7FTiu11dtnab3FVq1adFDXRrEKKSVRO+Pp+flmjl9OYtpTQcwZfj/3emtcDKZ6FNsFyhEoSg03Zx3v9w3k9NXbfLmx8Df8YqWV2urt09rraBVYtlGs4lpyGqPn72HSikOE+FZh7YQH6d+qTunVAqgexeUe5QgUNsNQkOYwxcFsQdpDDavTr2VtvtoYZ7a2wIBBrdRAkWmltnr7tOY6WgaWbRCr2HTiKj1mbGZz3DUm927KgpFttJ8FKOwO5QgUNiF3QZpEEp8Yz5DlQ6j2cbUCDmFy76Z4uzvz6g8HyMjKNnNFPQMGDCAi15t4kWmltnr7tPQ6WgaWSxCryMjK5sNfjzF0zi6quDvx8/MdGNGhnhKEU5hEOQKFTTBVkAaQkJpQoELZ292ZiH7NOfL3Lb6OLjoI/OabbzJixIh/7mUqrVQLuQZLKM7D2rCUFBNTuK3FjFX8dTOVgd9u55s/TzG4jS8/P9+RxvdULvQcxd2NcgQKm1BY4ZmpCuVHm93DE0H3MnPDSQ7laNmbQwjBN998Yz6tVCu5Bkuw9mGdeykJCre1GDGPjceu8NjMzZy4nMyswS35T2ggbs46C34Qxd2McgQKm1BU4ZkpRzG1TzOqebgwYcl+7mSYzyIC82mlp0+f1lauoSjMPawfe8z0DMWapSQrYhVZ2ZJPp/3IiHm7uefsCX5e/ja9D220yY+oqPgoR6CwCaYK0nJjylF4uzvz6VNBxF1JLrTQzIDZtFJTsstQOtWrph7Ww4bB/PmmZyjWLiVZEKu4lpzG0IiVfHHVlQEHf+enha9w36HdStNHYTHKEShsgqEgzcfNp8CxwiqUOzaoxogO/szbdpY/jl0u8j7500qPHTvGhEqVTA8urerV/A/rNWvMv/XbuEZh77kbPD5zC3sSJR+v+ZyPf52Ja2Z63nsqFEWgHIHCZoQFhnHt9WtEhkbi5+WHQFhUofxGj8Y0qVWZV5Ye4O/E1CLvkzuttG3btnz88cdlJtdgksLe+m1U6yClZOGOeAZ+ux0nR8GPka8xIHad5bYUFyUpXSFRHcoUNicsMMyqqmRXJx1fPN2S3rO28OLi/Swa3bZAn+P8DBgwACcnJ3r06IGbmxtHly/XL8uUh/7EhXUJy91LGfQ2W2nrnYwsJq04xI97L9C5UXVmDGyJ11eZ5m2xFdev65ebDLMdw5IXqCIxO0fNCBQlxpJCsqKoX92D9/s2Z9eZ60wz09EsP/369cPNLac4yhK5htKiqLd+w1JSq1ZW23r+egqhX21j+b4LTOjWgP8Nux8vdydtNX0Ms4AzZ5SkdAVFOQJFiTBVSFbczmahIXUY3MaXr6NP8duhv83er6ROR3M00teJPn6Fx2dt4cKNFP43rDUTujX8p0BMi3tGRUG1ajBkiOkZjgElKW33KEegKBG27mz27hNNCarrzStLDxB3JSnPMVs6Hasozrq4DfV1srMlMzecZMS83dTycmXV+I50aVxT03sa6x0SzCvFGlGS0naPcgSKEmHrzmYujjq+GRKCq5OO0fP3cOP2PyJzWrXTLJQybrWYmJLBswv28Nm6E/QNrs2K5zrg52MmS8qWmKp3MIWSlK4QKEegKBFadDar5eXGt8+04q+bdxgXFUN6pr5YTKt2moVShq0WD11M5PEvNrPp5FWm9mnGZwOCrKsStnYmk3t8YUtBBpSkdIVBOQJFidCqs1lr/6p8/GQLdpy+zqQVsUgpNW2naZYyaLUopeT7XecI/XobmVmSJWPaM/T0VkS9etY91K2ZyeQfXxju7hAZWfZBeYXNUI5AUSJydzaztG7AUvq2rM2Ebg34IeYCn6w9rmk7TbOUcqvFlPRMXll6gDeXx9K2XlVWj+9IyOY1hT/UTb35WzuTsXQpyMdHzQIqIKqOQFFirK0bsIYXuzbgSlIaX0Wf4q1e7ZndezaTNkziXOI5fL18iegaQVhgGNHR0Zrcn4iIvLnzoNm6+LFLt/hX1F5OX7vNS90a8nyXAH0LyaIe6qZy+8091IszwxFC7/jq1YNr16z7oRR2gXIEinKNEIL3+jTnxu103v/lKBH9OnJ2wtnSMyB3AZhGxWr6paDzTFl1GE9XJyJHtaVDQLV/BhT28DbnJHQ6yDIh5FfYDMdUXMDPT78EBKCVs1WUOWppSKEZtsr51zkIIEj6qgAAEH9JREFUZgwKpkvjGkxacYhFO0s5b13DVos3bqczZmEM/14RS5t6Vfn1xU55nQAU/vA25ySysqwrMFNN5u9qlCNQaIKtc/5dHHV8PSSELo1r8O8VsczdesbGFpc+sVcz6fH5JjYev8JbvZowf0Qbqnu6FBxY2EPanJMwZPRYWmCmmszf1ShHoNAELXL+Dc7g0WY1mbLqCJ+uPW6+f3E55nZaJm/9FMu0mDQquzqx4rkOjO50n/k2koU9pAtzEtbOZMqiybwSsSsXaOoIhBA9hBDHhRBxQoiJJo6/LIQ4IoQ4KITYIITw09IeRelRVM5/cZeNXBx1fBXWisFt6vLFxjhe/eFgkU1tLKKUHkhbTl7j0RmbiNp5jkf9HVk1viPNa3sVfaLhIb1woX77mWf0doL9vsmXcbGe4h80CxYLIXTAl0B34AKwWwjxs5TySK5h+4DWUsoUIcQ44GNgoFY2KUoPXy9f4hMLBh8lkmofVyMpPYn0LH3VsGHZCPQZSFGxUSYzgwzoHAQf9AvknspuTF9/gtPXkhl2X3aBe1mM4YGkoapmQnIaH6w5xo97L3BftUosHdOe22cP4upkZYGYKTtnz/4noGtPFJYNZQ+OrAKh5YygDRAnpTwtpUwHFgN9cg+QUm6UUhp+E3YAdTS0R1GKFNaxLCE1wegEDBiWjSyNLSw6tIjph7pz1fk/7Dt/mbe3pbDlZDFTGzWsHs7KlkTuiKfLtD/5+cBFnnu4Pmte7MT9/lXLlZ1lQhkU6ylMI7RaYxVCPAn0kFKOztl+BmgrpXzezPgvgEtSyvdNHAsHwgGqV6/eaunSpZrYrBXJycl4eHiUtRkWYyt7r6de52LSxQIP/cJw1jmbHO+scyawRqDxuvGJ8WRL/SzgWqoLG875cfWOoGc9Z/oGOOGsM7PeboqYGPPHWrWy/Dr5OJKQxaKjaVxIljSq4sDQZi7U9vjn3cvqz1kjOy3F5r/HsbGQbuJ3w9kZAgNtcgt7+9sD7Wzu3LlzjJSytaljWtYRmPpLNOl1hBBDgNbAQ6aOSylnA7MBGjVqJB9++GEbmVg6REdHY08229pehykOSNP/9Xnw8/LjXOI5k2MFguwB+ge//wz/AstO/2kwjW/irrLmTEeOJjnzQb9A2tcv2DbTJMOHF51DnxtD5a6ZuoL952/yydpjbI1LoE4VN74Ka0LP5vcgRN4/Cas/Z2vttDE2/z2+eNF0sd7s2WCj+9jb3x6Ujc1aLg1dAOrm2q4D/JV/kBCiGzAJeEJKmaahPYoywhItIINUhCV6QqYC0U4OknN8ROSotmRlSwb/dwfjImM4fTW5aANNZd4AJCcXDFyaCXDKyCi2xV1jyHc76fvlVo7+ncTbjzdl/csP8VhgrQJOoFhUtFx/lbJabtDSEewGGggh6gkhnIFBwM+5BwghWgLfoncCVzS0RVGGmIoXODk44ePmU0CfyBI9ocKcRccG1Vg74UEmdGvAnyeu8sj0Tby8dD9H/75l3kDDA8kn3wwiIaFgFku+dfokZzciGz5Ez403efq7nRy/nMTEno3Z9HpnRnWsZ10wuCjK44OzpNlWZZGyqiiAZktDUspMIcTzwFpAB8yRUh4WQkwF9kgpfwY+ATyAH3LemM5JKZ/QyiZF2WDI+CksE8iasRFdIwhfFZ6nTsFBOBidhZuzjgndGhLW1o+vouNYsvs8y/depE29qoS2rE3P5rX07R3z3DhM/5DP34glfxbLuXMkulRii38wvzTuxPqANqQ7OtPsUhwfbfiaPi8MxvWhbiX9yMwTFlZ+HpalkG2lKB00CxZrRaNGjeTx48fL2gyrsLd1SnuwN3+K6WeNPiO0Z6jJsTdT0lm06xzL9lzg9LXbOAgIqutNh/rVaF7biya1PKnl5Yazs2MBCeYUJxfiq9zLqXVbOHD+JnuXr2e/jz9ZDjp8bt+k99FN9D0STdDfJ/RBMSvW6+3hc85NAXv9/cs0ZmEJ9vYZg3Y2CyHKJFisUGhGfsXTwtRHvd2dee7hAMY9VJ+DFxJZv3QDm/fu46uzvmQ7/LN0U+XF73G/cxtddjaZOh2JLh7cdslZplq0D2dHB5r7+THuz594+PgOgv86jqPMV79QEVIfiwiGG1HpnxUG5QgUdw1CCII2/ULQO+G8kpJCqqMLJ6r5crx2Qy49PZwrmU6kbj1KVrbEMTsTrzu3qZqZit+AJ/Dv+ygNa3ri7OgAUckwbAHkdwJQev17LX1YF+e65pZ7atfOO9acYqnqYWx3KEeguLvIFex1y0wj6NJJgi6dhGuH9MsZUakmHrAD8l7D8MAtpT4FBdBybb6worV58/LuL8VeDQptUaJziruLopYzLM1isWUGj7WZN1pWGFuz3FMes5gUxUI5AsXdhS1bT9oi9dFUXcKQIVCtmnmHoOXavLWfj0r/rBAoR6C4uyhvRVnmegUnJOgVRoUoOEvQso9yeft8FKWCcgSKu4vytpxR2Fu8IZU1vzyzlg/r8vb5KEoF5QgUdx+luZxR1Pq/pW/xuWMAWj+s1XLPXYdyBAqFVljSeMWczpEpcs8e1MNaYUOUI1AotMKS7B5zOkemUPn5Co1QjkCh0ApLs3vCwuDaNYiM1C/zgH7JJzcqYKvQEOUIFBWfsmqQXtxUTCn1vYlVwFZRSihHoKjYlGWD9JJk96gYgKIUUY5AUbEpyz6/KhVTYScorSFFxaasFTLLU/8AhcIMakagqNhoWYWrUFQQlCNQVGyUZIJCUSTKESgqNmqdXqEoEhUjUFR81Dq9QlEoakagsG8MNQIxMaVbI6BQVCCUI1CUW6Jio/Cf4Y/DFAf8Z/gTFZvvIZ+7RgBKt0ZAoahAKEegKJdExUYRviqc+MR4JJL4xHjCV4XndQZlWSOgUFQglCNQlEsmbZhESkbeh3xKRgqTNuR6yJd1jYBCUUFQjkBRLjmXaPphnme/qhFQKGyCcgSKcomvl+mHeZ795bFGoKwE7hSKEqAcgaJcEtE1AnenvA95dyd3IrrmesjnrhGAsq8RKEuBO4WiBChHoCiXhAWGMbv3bPy89A95ndAZYwR5AsYGlc5WrYpW6dT6bV0FrxV2iiooU5RbwgL1D/XwVeHGwLEheyj3cYswvK0bHtSGt3Ww3QxCBa8VdoqaESjKNRZlD1l0oVJ4W1fBa4WdohyBolxjUfaQRRcqhbf1woLXKoisKMcoR6Ao11iUPWTRhUrhbd2cwB2YDiJfv/7/9u49RK7yDuP499G4LcUYm6TFujWJbRPoNtYqq9W0eEEpyZYaSkOJJCUpoaKtF9oiLRSs2H96xVZQwnrBWPDWCiWIouClltrVBKRRA5Z4axML1mJFsTUm+fWPc5adnZ3ZeXdnzmWY5wNDzsm8zHn2cM6857zve97p3bbNuuCKwGotafRQ0ge1uVofG+vtlXqrn5hs1yx14EB32zLrEVcEVmuNo4eEWL5oOeNfGZ9bRzG0vlrfsgV27Ch+uGe75qeDB3u7HbN58qghq71Np2ya+xd/yw9qmo56xYr2Hci9fBZh2bKpifEaDQ31bhtmXfAdgQ2usoZ7tmuWGh7u7XbM5qnQikDSWkkvSNon6Yct3v+ApHvy95+StKLIPGbTlDXcs10n8uLFvd2O2TwVVhFIOhq4EVgHjAAXSxppKrYNeDMiPgVcD/ysqDxmM5Q5V1GrTmSzmijyjuBMYF9EvBQRB4G7gfVNZdYDO/Ll3wMXSFKBmcym+PeMzYBiO4uHgX80rO8HPt+uTEQckvQWsAR4o7GQpEuAfD4A3pP0XCGJi7OUpr+p5votL/Qi86uvwubN2av1BhafCMPHwND7cPA1OPAGdPMwQL/t537LC87caHm7N4qsCFpd2cc8yhAR48A4gKTdETHafbzy9FvmfssLzlyGfssLzpyqyKah/cBJDesfB15rV0bSAmAR3V1hmZnZHBVZEewCVko6WdIQsBHY2VRmJ7AlX94APBoRM+4IzMysOIU1DeVt/pcDDwFHA7dFxPOSrgN2R8RO4Fbgt5L2kd0JbEz46PGiMheo3zL3W15w5jL0W15w5iTyBbiZ2WDzk8VmZgPOFYGZ2YCrbUXQb9NTJOT9nqS9kvZIekRS2zG9ZemUuaHcBkkhqfJheCmZJX0939fPS7qz7IxNWTodF8skPSbpmfzYGKsiZ0Oe2yS93u5ZHWVuyP+ePZJOLztji0ydMm/Ks+6R9KSkU8vO2CLTrJkbyp0h6bCkDYUGiojavcg6l18EPgEMAX8FRprKfBvYni9vBO6ped7zgQ/ly5dVmTc1c15uIfAEMAGM1j0zsBJ4Bvhwvv7RmucdBy7Ll0eAVyrex+cApwPPtXl/DHiQ7Bmgs4CnqsybmHlNw/Gwrh8yNxw/jwIPABuKzFPXO4J+m56iY96IeCwiJuc8niB7rqJKKfsY4CfAz4H/lRmujZTM3wJujIg3ASLi9ZIzNkrJG8Bx+fIiZj5rU6qIeILZn+VZD9wRmQngeEkfKydda50yR8STk8cD9Tj3UvYzwBXAfUDhx3BdK4JW01M0z9k7bXoKYHJ6iiqk5G20jeyqqkodM0s6DTgpIu4vM9gsUvbzKmCVpD9LmpC0trR0M6XkvRbYLGk/2ZXfFeVEm7e5Hut1U4dzryNJw8BXge1lbK+uP0zTs+kpSpKcRdJmYBQ4t9BEnc2aWdJRZDPCbi0rUIKU/byArHnoPLIrvz9JWh0R/yk4WyspeS8Gbo+IX0k6m+y5mtURcaT4ePNSp/NuTiSdT1YRfLHqLAl+DfwgIg6X0dBR14pgLtNT7K/B9BQpeZF0IfAj4NyIeK+kbO10yrwQWA08nh+IJwA7JV0UEbtLSzld6nExERHvAy9LeoGsYthVTsQZWTrl3QasBYiIv0j6INmkY1U2ac0m6VivG0mfBW4B1kXEv6vOk2AUuDs/95YCY5IORcQfCtla1Z0mbTpJFgAvAScz1cn2maYy32F6Z/G9Nc97GlnH4cqq929q5qbyj1N9Z3HKfl4L7MiXl5I1Yyypcd4Hga358qfJvlRV8X5eQfuO1y8zvbP46SqzJmZeBuwD1lSdMzVzU7nbKbizuJZ3BFHc9BRV5v0FcCzwu7yW/3tEXFTzzLWSmPkh4EuS9gKHgaujoivAxLzfB26W9F2yJpatkZ/9VZB0F1mz2tK83+LHwDEAEbGdrB9jjOyL9V3gm9UknZKQ+Rqy/sOb8nPvUFQ8I2lC5nLzVHjMmZlZDdR11JCZmZXEFYGZ2YBzRWBmNuBcEZiZDThXBGZmA66Ww0fN6kDSEuCRfPUEsuGo/8rX342INZUEM+sxDx81SyDpWuCdiPhl1VnMes1NQ2bzIOmd/N/zJP1R0r2S/ibpp/n8909LelbSJ/NyH5F0n6Rd+esL1f4FZlNcEZh171TgKuAU4BvAqog4k2xum8nZRH8DXB8RZwBfy98zqwX3EZh1b1dE/BNA0ovAw/n/P0v2g0QAFwIjDTNJHidpYUS8XWpSsxZcEZh1r3Em2SMN60eYOseOAs6OiP+WGcwshZuGzMrxMHD55Iqkz1WYxWwaVwRm5bgSGM1/QH0vcGnVgcwmefiomdmA8x2BmdmAc0VgZjbgXBGYmQ04VwRmZgPOFYGZ2YBzRWBmNuBcEZiZDbj/A/BzOcMLelzFAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "import matplotlib.pyplot as plot\n",
    "from sklearn.linear_model import LinearRegression\n",
    "\n",
    "def true_function(x):\n",
    "    x2 = (x*8) - 1\n",
    "    return ((np.sin(x2)/x2)*0.6)+0.3\n",
    "    \n",
    "#\n",
    "x_train  = np.arange(0, 0.6, 0.01)\n",
    "x_test  = np.arange(0.6, 1.1, 0.01)\n",
    "x_true = np.concatenate( (x_train, x_test) )\n",
    "\n",
    "#\n",
    "y_true_train = true_function(x_train)\n",
    "y_true_test = true_function(x_test)\n",
    "y_true   = np.concatenate( (y_true_train, y_true_test) )\n",
    "\n",
    "#\n",
    "y_train = y_true_train + (np.random.rand(*x_train.shape)-0.5)*0.4\n",
    "y_test  = y_true_test + (np.random.rand(*x_test.shape)-0.5)*0.4\n",
    "\n",
    "#\n",
    "lr_x_train = x_train.reshape((x_train.shape[0],1))\n",
    "reg = LinearRegression().fit(lr_x_train, y_train)\n",
    "reg_pred = reg.predict(lr_x_train)\n",
    "print(reg.coef_[0])\n",
    "print(reg.intercept_)\n",
    "\n",
    "#\n",
    "plot.xlim([0,1.5])\n",
    "plot.ylim([0,1])\n",
    "l1 = plot.scatter(x_train, y_train, c=\"g\", label=\"Training Data\")\n",
    "l2 = plot.scatter(x_test, y_test, c=\"r\", label=\"Testing Data\")\n",
    "l3, = plot.plot(lr_x_train, reg_pred, color='black', linewidth=3, \n",
    "                label=\"Trained Model\")\n",
    "l4, = plot.plot(x_true, y_true, label = \"True Function\")\n",
    "plot.legend(handles=[l1, l2, l3, l4])\n",
    "\n",
    "#\n",
    "plot.title('Drift')\n",
    "plot.xlabel('Time')\n",
    "plot.ylabel('Sales')\n",
    "plot.grid(True, which='both')\n",
    "plot.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The \"True function\" represents what the data does over time. Unfortunately, you only have the training portion of the data. Your model will do quite well on the data that you trained it with; however, it will be very inaccurate on the new test data presented. The prediction line for the model fits the training data well but does not fit the est data well.\n",
    "\n",
    "## Preprocessing the Sberbank Russian Housing Market Data\n",
    "\n",
    "The examples provided in this section use a Kaggle dataset named The Sberbank Russian Housing Market, which you can access from the following link.\n",
    "\n",
    "* [Sberbank Russian Housing Market](https://www.kaggle.com/c/sberbank-russian-housing-market/data)\n",
    "\n",
    "Because Kaggle provides datasets as training and test, we must load both of these files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.preprocessing import LabelEncoder\n",
    "\n",
    "PATH = \"/Users/jheaton/Downloads/sberbank-russian-housing-market\"\n",
    "\n",
    "\n",
    "train_df = pd.read_csv(os.path.join(PATH,\"train.csv\"))\n",
    "test_df = pd.read_csv(os.path.join(PATH,\"test.csv\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I provide a simple preprocess function that converts all numerics to z-scores and all categoricals to dummies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def preprocess(df):\n",
    "    for i in df.columns:\n",
    "        if df[i].dtype == 'object':\n",
    "            df[i] = df[i].fillna(df[i].mode().iloc[0])\n",
    "        elif (df[i].dtype == 'int' or df[i].dtype == 'float'):\n",
    "            df[i] = df[i].fillna(np.nanmedian(df[i]))\n",
    "\n",
    "    enc = LabelEncoder()\n",
    "    for i in df.columns:\n",
    "        if (df[i].dtype == 'object'):\n",
    "            df[i] = enc.fit_transform(df[i].astype('str'))\n",
    "            df[i] = df[i].astype('object')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we run the training and test datasets through the preprocessing function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "preprocess(train_df)\n",
    "preprocess(test_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we remove thr target variable.  We are only looking for drift on the $x$ (input data)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_df.drop('price_doc',axis=1,inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## KS-Statistic\n",
    "\n",
    "We will use the KS-Statistic to determine the difference in distribution between columns in the training and test sets. As a baseline, consider if we compare the same field to itself. In this case, we are comparing the **kitch_sq** in the training set. Because there is no difference in distribution between a field in itself, the p-value is 1.0, and the KS-Statistic statistic is 0. The P-Value is the probability of no difference between the two distributions. Typically some lower threshold is used for how low a P-Value is needed to reject the null hypothesis and assume there is a difference. The value of 0.05 is a standard threshold for p-values. Because the p-value is NOT below 0.05, we expect the two distributions to be the same. If the p-value were below the threshold, the **statistic** value becomes interesting. This value tells you how different the two distributions are. A value of 0.0, in this case, means no differences. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ks_2sampResult(statistic=-0.0, pvalue=1.0)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from scipy import stats\n",
    "\n",
    "stats.ks_2samp(train_df['kitch_sq'], train_df['kitch_sq'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's do something more interesting.  We will compare the same field **kitch_sq** between the test and training sets.  In this case, the p-value is below 0.05, so the **statistic** value now contains the amount of difference detected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ks_2sampResult(statistic=0.25829078867676714, pvalue=0.0)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stats.ks_2samp(train_df['kitch_sq'], test_df['kitch_sq'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we pull the KS-Stat for every field.  We also establish a boundary for the maximum p-value and how much of a difference is needed before we display the column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "id: Ks_2sampResult(statistic=1.0, pvalue=0.0)\n",
      "timestamp: Ks_2sampResult(statistic=0.8982081426022823, pvalue=0.0)\n",
      "life_sq: Ks_2sampResult(statistic=0.2255084471628891, pvalue=7.29401465948424e-271)\n",
      "max_floor: Ks_2sampResult(statistic=0.17313454154786817, pvalue=7.82000315371674e-160)\n",
      "build_year: Ks_2sampResult(statistic=0.3176883950430345, pvalue=0.0)\n",
      "num_room: Ks_2sampResult(statistic=0.1226755470309048, pvalue=1.8622542043144584e-80)\n",
      "kitch_sq: Ks_2sampResult(statistic=0.25829078867676714, pvalue=0.0)\n",
      "state: Ks_2sampResult(statistic=0.13641341252952505, pvalue=2.1968159319271184e-99)\n",
      "preschool_quota: Ks_2sampResult(statistic=0.2364160801236304, pvalue=1.1710777340471466e-297)\n",
      "school_quota: Ks_2sampResult(statistic=0.25657342859882415, pvalue=0.0)\n",
      "raion_build_count_with_material_info: Ks_2sampResult(statistic=0.19083554469945835, pvalue=4.2830715478540455e-194)\n",
      "build_count_block: Ks_2sampResult(statistic=0.2085099875571384, pvalue=1.2494725963839073e-231)\n",
      "build_count_wood: Ks_2sampResult(statistic=0.20880892257287548, pvalue=2.7121209564049356e-232)\n",
      "build_count_brick: Ks_2sampResult(statistic=0.20707302040295733, pvalue=1.8727526843617417e-228)\n",
      "build_count_monolith: Ks_2sampResult(statistic=0.1779053203005685, pvalue=9.707337049927237e-169)\n",
      "raion_build_count_with_builddate_info: Ks_2sampResult(statistic=0.19083554469945835, pvalue=4.2830715478540455e-194)\n",
      "build_count_1946-1970: Ks_2sampResult(statistic=0.2120848849003817, pvalue=1.2611967894038327e-239)\n",
      "build_count_1971-1995: Ks_2sampResult(statistic=0.19318717795964874, pvalue=6.749018351478723e-199)\n",
      "build_count_after_1995: Ks_2sampResult(statistic=0.1899788049625577, pvalue=2.326885447792135e-192)\n",
      "cafe_sum_500_min_price_avg: Ks_2sampResult(statistic=0.43416513954613944, pvalue=0.0)\n",
      "cafe_avg_price_500: Ks_2sampResult(statistic=0.4337483600913839, pvalue=0.0)\n",
      "cafe_sum_1000_min_price_avg: Ks_2sampResult(statistic=0.20459725920896277, pvalue=4.926006714051654e-223)\n",
      "cafe_sum_1000_max_price_avg: Ks_2sampResult(statistic=0.2025042643599122, pvalue=1.673760717424549e-218)\n",
      "cafe_avg_price_1000: Ks_2sampResult(statistic=0.20399591976318965, pvalue=9.979898113798837e-222)\n",
      "cafe_sum_1500_min_price_avg: Ks_2sampResult(statistic=0.13264580849650315, pvalue=5.402810864486807e-94)\n",
      "cafe_avg_price_1500: Ks_2sampResult(statistic=0.13316992134991978, pvalue=9.81160594279586e-95)\n",
      "cafe_sum_2000_min_price_avg: Ks_2sampResult(statistic=0.10808898569668424, pvalue=1.4808059430311726e-62)\n",
      "cafe_sum_2000_max_price_avg: Ks_2sampResult(statistic=0.10732529051140638, pvalue=1.1100804327460878e-61)\n",
      "cafe_avg_price_2000: Ks_2sampResult(statistic=0.1081218037860151, pvalue=1.3575759911857293e-62)\n"
     ]
    }
   ],
   "source": [
    "for col in train_df.columns:\n",
    "    ks = stats.ks_2samp(train_df[col], test_df[col])\n",
    "    if ks.pvalue < 0.05 and ks.statistic>0.1:\n",
    "        print(f'{col}: {ks}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Detecting Drift between Training and Testing Datasets by Training\n",
    "\n",
    "Sample the training and test into smaller sets to train.  We want 10K elements from each; however, the test set only has 7,662, so we only sample that amount from each side."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "7662\n"
     ]
    }
   ],
   "source": [
    "SAMPLE_SIZE = min(len(train_df),len(test_df))\n",
    "SAMPLE_SIZE = min(SAMPLE_SIZE,10000)\n",
    "print(SAMPLE_SIZE)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We take the random samples from the training and test sets and add a flag called **source_training** to tell the two apart."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "training_sample = train_df.sample(SAMPLE_SIZE, random_state=49)\n",
    "testing_sample = test_df.sample(SAMPLE_SIZE, random_state=48)\n",
    "\n",
    "# Is the data from the training set?\n",
    "training_sample['source_training'] = 1\n",
    "testing_sample['source_training'] = 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we combine the data that we sampled from the training and test data sets and shuffle them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build combined training set\n",
    "combined = testing_sample.append(training_sample)\n",
    "combined.reset_index(inplace=True, drop=True)\n",
    "\n",
    "# Now randomize\n",
    "combined = combined.reindex(np.random.permutation(combined.index))\n",
    "combined.reset_index(inplace=True, drop=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will now generate $x$ and $y$ to train.  We attempt to predict the **source_training** value as $y$, which indicates if the data came from the training or test set.  If the model successfully uses the data to predict if it came from training or testing, then there is likely drift.  Ideally, the train and test data should be indistinguishable.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 1, 1, ..., 1, 0, 0])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get ready to train\n",
    "y = combined['source_training'].values\n",
    "combined.drop('source_training',axis=1,inplace=True)\n",
    "x = combined.values\n",
    "\n",
    "y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will consider anything above a 0.75 AUC as having a good chance of drift."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "id 1.0\n",
      "timestamp 0.9601862111975688\n",
      "full_sq 0.7966785611424911\n",
      "life_sq 0.8724218330166038\n",
      "build_year 0.8004825176688191\n",
      "kitch_sq 0.9070093804672634\n",
      "cafe_sum_500_min_price_avg 0.8435920036035689\n",
      "cafe_avg_price_500 0.8453533835344671\n"
     ]
    }
   ],
   "source": [
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.model_selection import cross_val_score\n",
    "\n",
    "model = RandomForestClassifier(n_estimators = 60, max_depth = 7,\n",
    "    min_samples_leaf = 5)\n",
    "lst = []\n",
    "\n",
    "for i in combined.columns:\n",
    "    score = cross_val_score(model,pd.DataFrame(combined[i]),y,cv=2,\n",
    "                            scoring='roc_auc')\n",
    "    if (np.mean(score) > 0.75):\n",
    "        lst.append(i)\n",
    "        print(i,np.mean(score))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3.9 (tensorflow)",
   "language": "python",
   "name": "tensorflow"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
